43 research outputs found

Join-Idle-Queue with Service Elasticity: Large-Scale Asymptotics of a Non-monotone System

Authors: Mukherjee Debankur, Stolyar Alexander
Publication venue: 'Institute for Operations Research and the Management Sciences (INFORMS)'
Publication date: 01/03/2018

We consider the model of a token-based joint auto-scaling and load balancing strategy, proposed in a recent paper by Mukherjee, Dhara, Borst, and van Leeuwaarden (SIGMETRICS '17, arXiv:1703.08373), which offers an efficient scalable implementation and yet achieves asymptotically optimal steady-state delay performance and energy consumption as the number of servers $N\to\infty$. In the above work, the asymptotic results are obtained under the assumption that the queues have fixed-size finite buffers, and therefore the fundamental question of stability of the proposed scheme with infinite buffers was left open. In this paper, we address this fundamental stability question. The system stability under the usual subcritical load assumption is not automatic. Moreover, the stability may not even hold for all $N$. The key challenge stems from the fact that the process lacks monotonicity, which has been the powerful primary tool for establishing stability in load balancing models. We develop a novel method to prove that the subcritically loaded system is stable for large enough $N$, and establish convergence of steady-state distributions to the optimal one, as $N \to \infty$. The method goes beyond the state-of-the-art techniques: it uses an induction-based idea and a "weak monotonicity" property of the model; this technique is of independent interest and may have broader applicability.

Comment: 30 pages

arXiv.org e-Print Archive

Pure OAI Repository

Scalable Load Balancing Algorithms in Networked Systems

Author: Mukherjee Debankur
Publication date: 01/01/2018

A fundamental challenge in large-scale networked systems, viz., data centers and cloud networks, is to distribute tasks to a pool of servers, using minimal instantaneous state information, while providing excellent delay performance. In this thesis we design and analyze load balancing algorithms that aim to achieve a highly efficient distribution of tasks, optimize server utilization, and minimize communication overhead.

Comment: Ph.D. thesis

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Supermarket Model on Graphs

Authors: Budhiraja Amarjit, Mukherjee Debankur, Wu Ruoyu
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2017

We consider a variation of the supermarket model in which the servers can communicate with their neighbors and where the neighborhood relationships are described in terms of a suitable graph. Tasks with unit-exponential service time distributions arrive at each vertex as independent Poisson processes with rate $\lambda$, and each task is irrevocably assigned to the shortest queue among the one where it first appears and its $d-1$ randomly selected neighbors. This model has been extensively studied when the underlying graph is a clique, in which case it reduces to the well-known power-of-$d$ scheme. In particular, results of Mitzenmacher (1996) and Vvedenskaya et al. (1996) show that as the size of the clique gets large, the occupancy process associated with the queue lengths at the various servers converges to a deterministic limit described by an infinite system of ordinary differential equations (ODE). In this work, we consider settings where the underlying graph need not be a clique and is allowed to be suitably sparse. We show that if the minimum degree approaches infinity (however slowly) as the number of servers $N$ approaches infinity, and the ratio between the maximum degree and the minimum degree in each connected component approaches 1 uniformly, the occupancy process converges to the same system of ODE as the classical supermarket model. In particular, the asymptotic behavior of the occupancy process is insensitive to the precise network topology. We also study the case where the graph sequence is random, with the $N$-th graph given as an Erdős-Rényi random graph on $N$ vertices with average degree $c(N)$. Annealed convergence of the occupancy process to the same deterministic limit is established under the condition $c(N)\to\infty$, and under a stronger condition $c(N)/\ln N\to\infty$, convergence (in probability) is shown for almost every realization of the random graph.

Comment: 32 pages
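The assignment rule in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `assign_task`, the adjacency-list representation, sampling neighbors without replacement, and uniform tie-breaking are all assumptions made for illustration.

```python
import random


def assign_task(queues, neighbors, v, d, rng):
    """Assign one task that arrives at vertex v and return the chosen server.

    queues:    list of current queue lengths, indexed by vertex
    neighbors: mapping from vertex to a list of its neighbors (hypothetical layout)
    d:         number of queues compared (the arrival vertex plus d-1 neighbors)
    """
    # Assumption: sample d-1 distinct neighbors uniformly at random,
    # or all neighbors if v has fewer than d-1 of them.
    k = min(d - 1, len(neighbors[v]))
    candidates = [v] + rng.sample(neighbors[v], k)

    # Join the shortest queue among the candidates.
    # Assumption: ties are broken uniformly at random.
    shortest = min(queues[u] for u in candidates)
    target = rng.choice([u for u in candidates if queues[u] == shortest])

    queues[target] += 1  # the assignment is irrevocable
    return target
```

When `neighbors[v]` contains every other vertex, the rule coincides with the power-of-$d$ scheme on a clique described above.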

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Phase transitions of extremal cuts for the configuration model

Authors: Dhara Souvik, Mukherjee Debankur, Sen Subhabrata
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2016

The $k$-section width and the Max-Cut for the configuration model are shown to exhibit phase transitions according to the values of certain parameters of the asymptotic degree distribution. These transitions mirror those observed on Erdős-Rényi random graphs, established by Luczak and McDiarmid (2001), and Coppersmith et al. (2004), respectively.

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Optimal Rate-Matrix Pruning For Large-Scale Heterogeneous Systems

Authors: Mukherjee Debankur, Zhao Zhisheng
Publication date: 31/05/2023

We present an analysis of large-scale load balancing systems, where the processing time distribution of tasks depends on both the task and server types. Our study focuses on the asymptotic regime, where the number of servers and task types tend to infinity in proportion. In heterogeneous environments, commonly used load balancing policies such as Join Fastest Idle Queue and Join Fastest Shortest Queue exhibit poor performance and even shrink the stability region. Interestingly, prior to this work, finding a scalable policy with a provable performance guarantee in this setup remained an open question. To address this gap, we propose and analyze two asymptotically delay-optimal dynamic load balancing policies. The first policy efficiently reserves the processing capacity of each server for "good" tasks and routes tasks using the vanilla Join Idle Queue policy. The second policy, called the speed-priority policy, significantly increases the likelihood of assigning tasks to the respective "good" servers capable of processing them at high speeds. By leveraging a framework inspired by the graphon literature and employing the mean-field method and stochastic coupling arguments, we demonstrate that both policies achieve asymptotic zero queuing. Specifically, as the system scales, the probability of a typical task being assigned to an idle server approaches 1.

arXiv.org e-Print Archive

Large deviations analysis for the $M/H_2/n + M$ queue in the Halfin-Whitt regime

Authors: Goldberg David A., Li Yuan, Mukherjee Debankur
Publication date: 01/01/2018

We consider the FCFS $M/H_2/n + M$ queue in the Halfin-Whitt heavy traffic regime. It is known that the normalized sequence of steady-state queue length distributions is tight and converges weakly to a limiting random variable $W$. However, those works only describe $W$ implicitly as the invariant measure of a complicated diffusion. Although it was proven by Gamarnik and Stolyar that the tail of $W$ is sub-Gaussian, the actual value of $\lim_{x \rightarrow \infty}x^{-2}\log(P(W >x))$ was left open. In subsequent work, Dai and He conjectured an explicit form for this exponent, which was insensitive to the higher moments of the service distribution. We explicitly compute the true large deviations exponent for $W$ when the abandonment rate is less than the minimum service rate, the first such result for non-Markovian queues with abandonments. Interestingly, our results resolve the conjecture of Dai and He in the negative. Our main approach is to extend the stochastic comparison framework of Gamarnik and Goldberg to the setting of abandonments, requiring several novel and non-trivial contributions. Our approach sheds light on several novel ways to think about multi-server queues with abandonments in the Halfin-Whitt regime, which should hold in considerable generality and provide new tools for analyzing these systems.

arXiv.org e-Print Archive

Repository TU/e

Join-the-Shortest Queue Diffusion Limit in Halfin-Whitt Regime: Tail Asymptotics and Scaling of Extrema

Authors: Banerjee Sayan, Mukherjee Debankur
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2018

Consider a system of $N$ parallel single-server queues with unit-exponential service time distribution and a single dispatcher where tasks arrive as a Poisson process of rate $\lambda(N)$. When a task arrives, the dispatcher assigns it to one of the servers according to the Join-the-Shortest Queue (JSQ) policy. Eschenfeldt and Gamarnik (2015) established that in the Halfin-Whitt regime where $(N-\lambda(N))/\sqrt{N}\to\beta>0$ as $N\to\infty$, the appropriately scaled occupancy measure of the system under the JSQ policy converges weakly on any finite time interval to a certain diffusion process as $N\to\infty$. Recently, it was further established by Braverman (2018) that the stationary occupancy measure of the system converges weakly to the steady state of the diffusion process as $N\to\infty$. In this paper we perform a detailed analysis of the steady state of the above diffusion process. Specifically, we establish precise tail asymptotics of the stationary distribution and scaling of extrema of the process on large time intervals. Our results imply that the asymptotic steady-state scaled number of servers with queue length two or larger exhibits an exponential tail, whereas that for the number of idle servers turns out to be Gaussian. From the methodological point of view, the diffusion process under consideration goes beyond the state-of-the-art techniques in the study of the steady state of diffusion processes. Lack of any closed-form expression for the steady state and intricate interdependency of the process dynamics on its local times make the analysis significantly challenging. We develop a technique involving the theory of regenerative processes that provides a tractable form for the stationary measure, and in conjunction with several sharp hitting time estimates, acts as a key vehicle in establishing the results.

Comment: 41 pages; To appear in the Annals of Applied Probability
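The pre-limit system described in this abstract can be sketched as a small simulation. The code below is an illustrative assumption, not the authors' method: it takes $\lambda(N) = N - \beta\sqrt{N}$ exactly (one sequence satisfying the Halfin-Whitt condition), simulates only the embedded jump chain of the Markov process, and breaks JSQ ties uniformly at random. The names `halfin_whitt_rate` and `simulate_jsq` are hypothetical.

```python
import math
import random


def halfin_whitt_rate(n, beta):
    """Arrival rate lambda(N) = N - beta*sqrt(N), so (N - lambda(N))/sqrt(N) = beta."""
    return n - beta * math.sqrt(n)


def simulate_jsq(n, beta, num_events, seed=0):
    """Run num_events jumps of the JSQ system and return the queue lengths.

    Assumes sqrt(n) > beta so that the arrival rate is positive.
    """
    rng = random.Random(seed)
    lam = halfin_whitt_rate(n, beta)
    queues = [0] * n
    for _ in range(num_events):
        busy = [i for i, q in enumerate(queues) if q > 0]
        # Each busy server completes service at rate 1, so the next event is
        # an arrival with probability lam / (lam + number of busy servers).
        if rng.random() < lam / (lam + len(busy)):
            shortest = min(queues)
            # JSQ: join a shortest queue (ties broken uniformly, an assumption).
            target = rng.choice([i for i, q in enumerate(queues) if q == shortest])
            queues[target] += 1
        else:
            queues[rng.choice(busy)] -= 1
    return queues
```

Counting the entries of the returned list that equal 0, or that are at least 2, gives the unscaled number of idle servers and of servers with queue length two or larger, the two quantities whose scaled steady-state tails the abstract discusses.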

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Distributed Rate Scaling in Large-Scale Service Systems

Authors: Mukherjee Debankur, Rutten Daan, Zubeldia Martin
Publication date: 03/06/2023

We consider a large-scale parallel-server system, where each server independently adjusts its processing speed in a decentralized manner. The objective is to minimize the overall cost, which comprises the average cost of maintaining the servers' processing speeds and a non-decreasing function of the tasks' sojourn times. The problem is compounded by the lack of knowledge of the task arrival rate and the absence of a centralized control or communication among the servers. We draw on ideas from stochastic approximation and present a novel rate scaling algorithm that ensures convergence of all server processing speeds to the globally asymptotically optimum value as the system size increases. Apart from the algorithm design, a key contribution of our approach lies in demonstrating how concepts from the stochastic approximation literature can be leveraged to effectively tackle learning problems in large-scale, distributed systems. En route, we also analyze the performance of a fully heterogeneous parallel-server system, where each server has a distinct processing speed, which might be of independent interest.

Comment: 32 pages, 4 figures

arXiv.org e-Print Archive

Asymptotically Optimal Load Balancing Topologies

Authors: Borst Sem C., Mukherjee Debankur, van Leeuwaarden Johan S. H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

We consider a system of $N$ servers inter-connected by some underlying graph topology $G_N$. Tasks arrive at the various servers as independent Poisson processes of rate $\lambda$. Each incoming task is irrevocably assigned to whichever server has the smallest number of tasks among the one where it appears and its neighbors in $G_N$. Tasks have unit-mean exponential service times and leave the system upon service completion. The above model has been extensively investigated in the case $G_N$ is a clique. Since the servers are exchangeable in that case, the queue length process is quite tractable, and it has been proved that for any $\lambda < 1$, the fraction of servers with two or more tasks vanishes in the limit as $N \to \infty$. For an arbitrary graph $G_N$, the lack of exchangeability severely complicates the analysis, and the queue length process tends to be worse than for a clique. Accordingly, a graph $G_N$ is said to be $N$-optimal or $\sqrt{N}$-optimal when the occupancy process on $G_N$ is equivalent to that on a clique on an $N$-scale or $\sqrt{N}$-scale, respectively. We prove that if $G_N$ is an Erdős-Rényi random graph with average degree $d(N)$, then it is with high probability $N$-optimal and $\sqrt{N}$-optimal if $d(N) \to \infty$ and $d(N) / (\sqrt{N} \log(N)) \to \infty$ as $N \to \infty$, respectively. This demonstrates that optimality can be maintained at $N$-scale and $\sqrt{N}$-scale while reducing the number of connections by nearly a factor $N$ and $\sqrt{N} / \log(N)$ compared to a clique, provided the topology is suitably random. It is further shown that if $G_N$ contains $\Theta(N)$ bounded-degree nodes, then it cannot be $N$-optimal. In addition, we establish that an arbitrary graph $G_N$ is $N$-optimal when its minimum degree is $N - o(N)$, and may not be $N$-optimal even when its minimum degree is $c N + o(N)$ for any $0 < c < 1/2$.

Comment: A few relevant results from arXiv:1612.00723 are included for convenience
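The connection-count comparison in this abstract can be made explicit with a quick edge count. This is a back-of-the-envelope sketch rather than a statement from the paper, and it assumes that "average degree $d(N)$" means the expected degree of a vertex, so that the random graph has $N d(N)/2$ edges in expectation while the clique has $N(N-1)/2$:

```latex
\[
\frac{|E(\text{clique})|}{\mathbb{E}\,|E(G_N)|}
  = \frac{N(N-1)/2}{N\,d(N)/2}
  = \frac{N-1}{d(N)} .
\]
```

If $d(N) \to \infty$ arbitrarily slowly, this ratio falls short of $N$ only by the slowly growing factor $d(N)$, which matches "nearly a factor $N$". If $d(N)$ grows only slightly faster than $\sqrt{N}\log(N)$, the ratio is only slightly smaller than $\sqrt{N}/\log(N)$.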

arXiv.org e-Print Archive

Crossref

Repository TU/e

Pure OAI Repository